ED 031 802 



£A0024?« 



DOCUMENT RESUME

24 

Comments on Professor Messick's Paper Entitled "The Criterion Problem in the Evaluation of Instruction:
Assessing Possible, Not Just Intended Outcomes."

California Univ., Los Angeles. Center for the Study of Evaluation of Instructional Programs.
Spons Agency-Office of Education (DHEW), Washington, D.C. Bureau of Research.

Report No-CSE-R-24
Bureau No-BR-6-1646
Pub Date May 69 

Note-From the Proceedings of the Symposium on Problems in the Evaluation of Instruction (Los Angeles,
December, 1967).

Descriptors-[illegible], *Cognitive Processes, *Evaluation Criteria, *Individual Differences,

[...] evaluation by determining the anticipated
behavioral outcomes, what is measured, techniques for measurement, and decision
making. Value judgments are usually faced only at the decision-making stage. Methods
[...] hopefully be given more emphasis in future evaluation
enterprises. Studies using individual difference data and those [...]
performance will be necessary to determine the mixture of techniques most
appropriate for evaluating a specific instructional system. Studies in the development
of cognitive styles and in the conditions under which a specific style is most
[...] should also use individual difference data. It is necessary to present
[...] of such studies to administrators and innovators. Related documents are EA 00[...]
and EA 002 473. (MLF)










COMMENTS ON PROFESSOR MESSICK'S PAPER ENTITLED "THE
CRITERION PROBLEM IN THE EVALUATION OF INSTRUCTION:
ASSESSING POSSIBLE, NOT JUST INTENDED OUTCOMES"



SYMPOSIUM ON PROBLEMS IN THE EVALUATION OF INSTRUCTION 

University of California, Los Angeles 
December, 1967 

M. C. Wittrock, Chairman 

Sponsored by the Center for the 
Study of Evaluation 



The research and development reported herein was
performed pursuant to a contract with the United
States Department of Health, Education, and Welfare,
Office of Education under the provisions of
the Cooperative Research Program.



CSE Report No. 24, May, 1969 
University of California, Los Angeles 



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION 



THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE 
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS 
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION 
POSITION OR POLICY. 



Leonard Cahen



Educational Testing Service 



From the Proceedings of the 








CO-DIRECTORS

Merlin C. Wittrock    Erick L. Lindman
ASSOCIATE DIRECTORS 

Marvin C. Alkin Frank Massey, Jr. C. Robert Pace 



The CENTER FOR THE STUDY OF EVALUATION OF INSTRUCTIONAL
PROGRAMS is engaged in research that will yield new ideas
and new tools capable of analyzing and evaluating instruction.
Staff members are creating new ways to evaluate content
of curricula, methods of teaching and the multiple
effects of both on students. The CENTER is unique because
of its access to Southern California's elementary, secondary
and higher schools of diverse socio-economic levels
and cultural backgrounds.




COMMENTS ON PROFESSOR MESSICK'S PAPER ENTITLED "THE
CRITERION PROBLEM IN THE EVALUATION OF INSTRUCTION:
ASSESSING POSSIBLE, NOT JUST INTENDED OUTCOMES"

Leonard Cahen 

In his paper Samuel Messick covers many important aspects of 
evaluation, and especially emphasizes cognitive styles and affective 
reactions as they pertain to instructional research. In addition 
to the commonly assessed areas of pupil achievement, cognitive 
styles and affective reactions are suggested as further areas of 
possible assessment in evaluation studies. Among the major themes 
of Dr. Messick's paper are the need for assessing multiple dimensions
of instructional outcomes, the importance of value judgments
in instructional systems and their evaluation, and the role that
individual differences in cognitive styles and information processing
may play in future instructional research.

The idea of assessing the possible and not just the intended
outcomes raises some important issues for the evaluator. At
first glance the term "intended" (and its counter term "unintended")
poses a difficulty. It takes a great deal of wisdom on the part
of the evaluator to anticipate the unintended outcomes of instruction
and to make the necessary plans for their assessment. In one
sense the unintended outcomes may be conceived as unsought "side
effects." In a hypothetical example a high school district adopted
a new tenth grade science curriculum. The objectives of the
curriculum, among others, were to foster scientific thinking and to
develop laboratory skills and an understanding of the scope of science.
At the end of the year, the pupils performed satisfactorily on tests
designed to measure these objectives. A negative "side effect,"
however, was seen in the fact that only a small proportion of these
tenth grade pupils elected an eleventh grade science course the following
year. The proportion of these students electing the eleventh
grade science course was significantly smaller than the proportion
of eleventh grade students taking science courses over the preceding
years.

The curriculum builders and school administrators felt that
there was a cause and effect relationship in the situation and decided
that it was important to learn more about why the students
generally failed to elect an eleventh grade science course. This
negative "side effect" or unintended outcome was assessed by student
interview.

A second form of the unintended outcome occurs when the curriculum
developer explicitly attempts to develop a certain set of
behaviors but not other behaviors. For this example let us assume
that he is attempting to develop behaviors A, B, and C but not D.
In this case the intended outcomes are A, B, and C, while D becomes
explicitly stated as an unintended outcome. An example might be
found in one of the modern mathematics curricula developed in the
early 1960's. Behaviors A, B, and C might be represented by three
sophisticated mathematics behaviors such as understanding of different
number systems, the development of heuristics in problem
solving, and an understanding of mathematical algorithms. Behavior
D (the unintended outcome) might be represented by a traditional
mathematical skill such as accuracy in routine computations or
competence in translating Roman numerals. In the mathematics curriculum,
the developer of the instructional program has, in part,
exposed his value system.

The example of the mathematics curriculum represents a common
problem that faced evaluators in the 1960's. The problem is manifested
in the area of instrumentation where the instructional system
or curriculum developer felt that standardized testing instruments
failed to measure the dimensions he was interested in--A, B, and C
(his intended outcomes)--but measured dimension D (his unintended
outcome) with relative precision and validity. The mathematics
curriculum suggested has led to considerable debate about the role
of comparing outcomes across competing curricula or instructional
systems when the competing systems have different intended outcomes.

Michael Scriven (1966) has introduced the terms formative and
summative evaluation. Formative evaluation is the gathering of information
in the early phases of developing a system of instruction.
It is used for immediate feedback in modification of the materials.
Summative evaluation provides information to the potential consumers
of the instructional product. However, as Scriven has pointed out,
the distinction between the two terms is not always clear. If curriculum
is to be an ongoing activity, a summative evaluation will serve
as a first stage of a formative evaluation for the second wave of
innovation. In the example of the mathematics curriculum developed
above, the evaluator would be asked to provide information on dimension
D as well as dimensions A, B, and C if the evaluation were
summative.

The two examples of unintended outcomes are developed to show
that an outcome may be an unsought side effect, unplanned by the 
innovator, or may reflect an a priori value judgment by the inno- 
vator to exclude certain dimensions from the instructional system. 

Dr. Messick has urged evaluators to include psychological as 
well as achievement dimensions in the evaluative act. He has pro- 
posed that, in addition to assessing the face value components of 
achievement, instructional systems must also focus on processes 
and psychological variables as outcomes.

The issue of value judgments in evaluation cannot be over- 
emphasized. Dr. Messick has pointed out that value judgments are 
made at many phases in the development and assessment of instruc- 
tional systems. Judgments determine what the anticipated behavioral 
outcomes are, how they are to be reached, the components and constructs
to be measured, and the selection of instruments or techniques
to measure or assess the components and constructs and, at a later
stage, are used to reach decisions from the outcome data matrix.
Too frequently value judgments, at least explicitly, are faced only
at the decision-making stages, if at all.

Scriven (1966) has taken the position that the evaluator must
play a key role in the incorporation of value judgments in the
evaluative process. This is not an easy task for the curriculum
evaluator, and because he may not represent the specific discipline
underlying the curriculum innovation, he has felt that the judgmental
processes must be left to the curriculum innovator who does represent
the field. Robert Stake (1967) has hypothesized that the
evaluator might have less access to data if he became identified
with the judging of an instructional program. Stake also suggests
the problem involved in judging the merit of a program from multivariate
data where some of the outcomes are positive and supportive
while other outcomes from the same program may reflect negative
findings.

If we are to follow Scriven's suggestion that evaluators play
active roles in the establishment and utilization of value judgments,
we will probably have to give thought to the future sources of evaluators
and careful thought as to their training. In addition, the need
for identifying methods to analyze values reflected in a program or
instructional system (and across competing instructional systems)
will hopefully be given more emphasis in evaluation enterprises of the
future.

A proposal is made here that might complement methodologies in 
evaluating and assessing values in instructional research. The pro- 
posal states that outcomes at any stage of instruction can be assessed 
in terms of how well the instruction has prepared the students for 
future learning. An assumption is made here that learning is a con- 
tinuous process and that school curricula will eventually reflect a 
continuity of experiences rather than inarticulated segments of curricula,
i.e., elementary school math, junior high school math, etc.
The success of an instructional program at any level could then be
evaluated, in part, in terms of pupils' increased aptitudes for 
future learning. 

I would now like to turn to the problem of utilizing individual 
difference data as elements in the process of placing groups of 
students in the most appropriate learning treatment. By most 
appropriate I mean the assignment of pupils to a learning situation 
or treatment where the pupil has the highest probability of maximum 
output or achievement. Dr. Messick has carried his suggestions past 
the initial stages of evaluation to the stage of implementation. 

The model using the interaction of treatment or instruction
and selected individual differences of learners has received a great
deal of attention recently (Cronbach & Gleser, 1964; Cronbach,
1966). While there is not always agreement about the results of such
an interaction model, the conceptualization does form interesting
and explicit hypotheses and requires a major change in the application
of quantitative strategies to education. It was not too
many years ago that behavioral scientists hoped for non-significant
statistical interactions in their analysis of factorial designs.

Non-significant statistical findings at the interaction level
allowed them (so they believed) to move on to the clear testing of
main effects. Similarly, statistical textbooks frequently emphasized
techniques for pooling the lower order interaction mean squares
with the error mean square so that more stable error terms would be
available for testing main effects. This technique of pooling
reduced Type II errors at the expense of potentially destroying
the "nuisance" relationships displayed in interactions.
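The arithmetic of pooling can be sketched in a few lines. The sums of squares, degrees of freedom, and main-effect mean square below are invented purely for illustration and do not come from any study cited here; the function name is likewise hypothetical. The sketch shows only how the interaction term is folded into the error term and what that does to the F ratio for a main effect.

```python
# Hypothetical two-factor ANOVA summary; none of these numbers come from the paper.

def pooled_error(ss_interaction, df_interaction, ss_error, df_error):
    """Pool an interaction term into the error term; return (mean square, df)."""
    pooled_df = df_interaction + df_error
    pooled_ms = (ss_interaction + ss_error) / pooled_df
    return pooled_ms, pooled_df

ms_main_effect = 40.0                    # assumed main-effect mean square
ss_interaction, df_interaction = 8.0, 2  # assumed interaction sum of squares and df
ss_error, df_error = 180.0, 30           # assumed within-cell (error) sum of squares and df

ms_error = ss_error / df_error           # error term without pooling
pooled_ms, pooled_df = pooled_error(ss_interaction, df_interaction, ss_error, df_error)

f_unpooled = ms_main_effect / ms_error   # main effect tested against the unpooled error
f_pooled = ms_main_effect / pooled_ms    # main effect tested against the pooled error

print(f"unpooled: F = {f_unpooled:.2f} on {df_error} error df")
print(f"pooled:   F = {f_pooled:.2f} on {pooled_df} error df")
```

With these assumed values the pooled error term gains two degrees of freedom and the F ratio rises slightly, but the interaction no longer appears as a separate line in the analysis, which is the loss the paragraph above describes.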

Dr. Messick has stated that interaction models may be useful
in the examination of relationships between teacher and pupil characteristics
on cognitive dimensions and in determining how these factors
might interact to affect pupil learning. One may also wonder about
the possible relationships between different organizations of the
teaching act with pupil and teacher characteristics and how these
would jointly affect pupil learning. Lastly, one may consider the
relationship of individual differences on cognitive dimensions
(teacher and pupil) and the structuring of the content of instruction.
Might there be ways of organizing and presenting the content
of instruction so that it interacts with individual differences of
pupils and teachers and teaching methods?

The use of individual difference interaction models will require
concentrated efforts by evaluators to develop measures with
minimal errors of measurement at the critical positions on the
individual difference scales where decisions are made to assign
pupils to learning experiences.
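A minimal sketch may make the point about critical positions concrete. It assumes, purely for illustration, that the outcome under each of two treatments is a straight-line function of a single individual difference score; the intercepts, slopes, standard error of measurement, and function names are all hypothetical. The point where the two lines cross is the critical position on the scale, and pupils whose observed scores lie within measurement error of it cannot be assigned with confidence.

```python
# Hypothetical linear interaction model; all coefficients are invented for illustration.

def crossover_point(intercept_a, slope_a, intercept_b, slope_b):
    """Score at which the two treatment lines predict the same outcome."""
    if slope_a == slope_b:
        return None  # parallel lines: no interaction, hence no crossover
    return (intercept_b - intercept_a) / (slope_a - slope_b)

def predicted(intercept, slope, score):
    """Predicted outcome for a pupil with the given individual difference score."""
    return intercept + slope * score

def assign_treatment(score, line_a, line_b):
    """Assign the treatment with the higher predicted outcome."""
    return "A" if predicted(*line_a, score) >= predicted(*line_b, score) else "B"

def assignment_is_uncertain(observed, cutoff, sem, z=1.96):
    """True when the observed score lies within z standard errors of the cutoff."""
    return abs(observed - cutoff) < z * sem

line_a = (20.0, 0.75)  # assumed (intercept, slope) for treatment A
line_b = (40.0, 0.25)  # assumed (intercept, slope) for treatment B
cutoff = crossover_point(*line_a, *line_b)

for observed in (30, 38, 50):
    treatment = assign_treatment(observed, line_a, line_b)
    shaky = assignment_is_uncertain(observed, cutoff, sem=3.0)
    print(observed, treatment, "uncertain" if shaky else "clear")
```

Under these assumptions the lines cross at a score of 40, so a pupil observed at 38 with a standard error of measurement of 3 points falls in the zone where the assignment could easily go either way.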

The technique of developing evaluation instruments for reliably
measuring individual differences has recently given ground to
the development of techniques to assess and evaluate group performance.
Evaluation studies will need to determine both the important
research questions and what mixture or combination of individual
versus group assessments reflects the most appropriate techniques
for answering the crucial questions underlying assessment and evaluation
of a specific instructional system. The item or matrix sampling
model developed by Frederic Lord (Lord & Novick, 1968) is a
valuable technique for estimating group performance on many dimensions.
Additional sampling combinations of items and subjects
(successive matrix samplings) would provide a better estimation
of the total covariance structure of the set of behaviors under
investigation. However, as Dr. Messick points out, there are limitations
and potential dangers in inferring performance of individuals
from "averaged" or group assessments. This danger is probably more
severe in assessing personality dimensions than in assessing achievement
output.
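The general idea of item or matrix sampling can be illustrated with a small simulation. This is a simplified sketch, not the estimation procedure given by Lord and Novick: the response data are synthetic, the function name is hypothetical, and the design (disjoint item forms given to disjoint random subgroups) is only one of many possible sampling plans. Each pupil answers a fraction of the items, yet the group's mean score on the full item pool can still be estimated.

```python
import random

def matrix_sample_mean(scores, n_forms, rng):
    """Estimate the group's mean total score when each pupil takes only one form.

    scores[p][i] is pupil p's score on item i. The full matrix is used here only
    to simulate which responses a real matrix-sampled administration would observe.
    """
    pupils = list(range(len(scores)))
    items = list(range(len(scores[0])))
    rng.shuffle(pupils)  # random assignment of pupils to forms
    rng.shuffle(items)   # random assignment of items to forms
    estimate = 0.0
    for form in range(n_forms):
        form_items = items[form::n_forms]    # items appearing on this form
        form_pupils = pupils[form::n_forms]  # pupils who take this form
        for item in form_items:
            observed = [scores[p][item] for p in form_pupils]
            estimate += sum(observed) / len(observed)  # item mean from one subgroup
    return estimate

rng = random.Random(0)
# Synthetic right/wrong responses: 600 pupils, 30 items, about 60 percent correct.
scores = [[1 if rng.random() < 0.6 else 0 for _ in range(30)] for _ in range(600)]

true_mean = sum(sum(row) for row in scores) / len(scores)
estimate = matrix_sample_mean(scores, n_forms=3, rng=rng)
print(f"mean from all responses: {true_mean:.2f}")
print(f"matrix-sampled estimate: {estimate:.2f}")
```

In this design no pupil receives a score on the full item pool, so the estimate describes the group only. It says nothing about any one pupil, which is the limitation noted above.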

It becomes apparent to the evaluator that there is an almost
infinite number of possible dimensions to assess and evaluate. The
innovator-evaluator must decide which dimensions have the greatest
potential for providing information for himself while also
providing multi-dimensional outcome measures for the potential
consumer. Value judgments again must play an important role. Explicit
statements from the innovator-evaluator concerning priorities
assigned to measures, and facts relating to which evaluative dimensions
are not included in the study are crucial.

I would now like to consider a few problems that lie ahead in
the utilization of individual difference measures in the cognitive
style (non-achievement) areas in curriculum research. The study
of individual differences in cognitive styles is in its infancy.
Dr. Messick encourages use of longitudinal methods to study the long-term
interactions between achievement and such psychological processes
as cognitive styles. It would be possible and highly desirable
to readminister achievement and cognitive batteries over a long
period of time and to study the covariance patterns over time within
and between the achievement and cognitive domains. The processes
underlying achievement and cognitive functions may both be changing,
thus making the analyses themselves and the understanding of the analyses
very difficult. Witkin and his colleagues (Witkin, Goodenough, &
Karp, 1967) have recently reported longitudinal and cross-sectional
data on measures of cognitive style. More research of this nature
will be needed if we are to utilize and understand cognitive styles
and their potential for curriculum research.

Dr. Messick has called to our attention three other aspects
related to the role of cognitive styles and curriculum research. He
has told us that school experiences should foster an increase in the
repertoire of styles for individuals rather than increase the competencies
of an individual on a limited set of styles at the expense
of other styles. The latter possibility is an inherent danger in
the individual difference interaction model. It might be possible
to structure an educational experience so that groups of students
develop or increase their cognitive abilities along one dimension
while failing to incorporate other styles into their repertoire.
The educator must be very careful in structuring these experiences.
If we take the dimension of tempo outlined by Jerome Kagan (1966),
analytic versus impulsive styles, it is easy to let the semantics
of analytic over impulsive determine what appears to be the obvious
treatment--and desirable outcome. We must learn to know under what
conditions it is favorable for a specific student to act analytically,
under what conditions it is best for him to act impulsively, and then
to determine a course of instruction that will foster both. It would
also be important to teach the student to decide when one style or
the other is more appropriate or beneficial.

Dr. Messick has hypothesized that there may be some very important
stages in the development of conceptual or cognitive styles,
possibly in the very early years, prior to the organism being exposed
to formal education. A great deal of research will undoubtedly be
devoted to this area in the future.

My final point concerns the difficulty of administering non-achievement
batteries in the evaluation of instructional programs.
By non-achievement I refer to measures of personality, cognitive
style, attitude, etc. The problem of invasion of privacy must be
considered. In addition, how do students respond to tests not perceived
as achievement measures? Students and school administrators
will not see the relevance of non-achievement type tests to the
evaluation of instructional outcomes.

We will need to convince ourselves first of the utility of
individual differences such as cognitive style for instructional
research, and then help the innovator to see the value of including
these and other process variables in the instructional "package."



REFERENCES 



Cronbach, L. J. How can instruction be adapted to individual
differences? In R. M. Gagne (Ed.), Learning and individual differences.
Columbus: Charles E. Merrill, Inc., Pp. 25-39.

Cronbach, L. J., & Gleser, G. C. Psychological tests and
personnel decisions. (2nd ed.) Urbana: University of Illinois
Press, 1965.

Kagan, J. Developmental studies in reflection and analysis.
In A. H. Kidd & J. L. Rivoire (Eds.), Perceptual development in
children. New York: International University Press, 1966. Pp.
487-522.



Lord, F. M., & Novick, M. R. Statistical theories of mental test
scores. Reading, Mass.: Addison-Wesley, 1968.



Scriven, M. The methodology of evaluation. American educational
research association monograph series on curriculum evaluation, No. 1.
Chicago: Rand McNally and Co., 1967.



Stake, R. E. The countenance of educational evaluation. Teachers
College Record, 1967, 68, 523-540.

Witkin, H. A., Goodenough, D. R., & Karp, S. A. Stability of
cognitive style from childhood to young adulthood. Journal of Personality
and Social Psychology, 1967, 291-300.



